Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
upwork.com 🟡 2026-05-12
🔹 Automated Data Extraction from HTML and PDF Files for Taekwondo Competitions
👤 Client: 🇷🇸 Serbia Member since 2023-02-11
💰 Price: ****
🚩 Problem: Need a web application that automates data extraction from several file types (HTML tables, PDF schedules, and bracket results) and produces consistent, accurate JSON output.
📦 Existing: Not specified
Specifications:
[Target] Web Application with File Upload Interface
[Method] AI-Powered Data Extraction
[UI/UX] User-friendly interface for file upload and data preview
[Stack] Frontend: React.js, Backend: Node.js, Database: MongoDB, API Integration: Claude/Gemini
[Security] Secure file handling, encryption of sensitive data
[Format] JSON output with structured data
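The [Format] line does not define a schema. As a minimal sketch, assuming hypothetical field names (`Competitor`, `Match`, `ExtractionResult`, and every attribute on them are illustrative and not from the listing), a structured record for one bracket could be shaped like this. The sketch is in Python only to show the data shape; the listing's stack names Node.js.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List, Optional

# All class and field names are assumptions for illustration;
# the listing does not specify a schema.

@dataclass
class Competitor:
    name: str
    club: str
    country: str

@dataclass
class Match:
    round: str
    mat: Optional[str]           # competition area, if the source states it
    scheduled_at: Optional[str]  # ISO 8601 string, if the source states it
    blue: Competitor
    red: Competitor
    winner: Optional[str] = None  # "blue", "red", or None if not yet decided

@dataclass
class ExtractionResult:
    source_file: str
    category: str
    matches: List[Match] = field(default_factory=list)

def to_json(result: ExtractionResult) -> str:
    """Serialize with sorted keys so the same input always gives the same text."""
    return json.dumps(asdict(result), sort_keys=True, ensure_ascii=False, indent=2)
```

Fixing the key set and key order up front is one way to approach the "consistent" output the client asks for, whichever extractor produces the values.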
Workflow:
1. Develop a web application with a user-friendly interface for uploading HTML and PDF files.
2. Implement extraction for HTML tables using an HTML/XML parsing library (more reliable than regular expressions on nested markup), with AI assistance where table layouts vary.
3. Render PDF pages to images, then use Optical Character Recognition (OCR) and AI models to extract data from those images.
4. Integrate the Claude or Gemini API to improve OCR accuracy and keep extraction results consistent.
5. Ensure the output is structured JSON with well-defined attributes such as competitor details, schedule, and results.
6. Implement validation checks to ensure accuracy of extracted data.
7. Provide a preview feature allowing users to review the extracted data before saving it.
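The HTML-table and validation steps of the workflow can be sketched without any AI call. The following is a minimal illustration using Python's standard `html.parser`; the helper names (`TableParser`, `table_to_records`, `validate`) and the required columns `name` and `club` are assumptions, not part of the listing. It assumes a single, non-nested table whose first row is the header and whose cells are explicitly closed. The Claude/Gemini and OCR steps are left out because the listing gives no details on them.

```python
from html.parser import HTMLParser
from typing import Dict, List, Sequence

class TableParser(HTMLParser):
    """Collects the cell text of every <tr> into a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._cell: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            # Collapse runs of whitespace inside a cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

def table_to_records(html: str) -> List[Dict[str, str]]:
    """Turn the first row into keys and each later row into a dict."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [h.lower().replace(" ", "_") for h in header]
    # zip() stops at the shorter side, so short rows simply lack keys;
    # validate() below reports them instead of dropping them silently.
    return [dict(zip(keys, row)) for row in body]

def validate(records: Sequence[Dict[str, str]],
             required: Sequence[str] = ("name", "club")) -> List[str]:
    """Return one message per missing or empty required field."""
    errors = []
    for i, rec in enumerate(records):
        for key in required:
            if not rec.get(key):
                errors.append(f"row {i}: missing '{key}'")
    return errors
```

A preview screen like the one in step 7 could show the records next to the messages returned by `validate`, so the user can correct flagged rows before saving.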